NSF PAR Search | NSF Public Access Repository

Beyond Raw Bytes: Towards Large Malware Language Models

Kurlandski, Luke; Berger, Harel; Pan, Yin; Wright, Matthew (February 2026, NDSS 2026)

Malware poses an increasing threat to critical computing infrastructure, driving demand for more advanced detection and analysis methods. Although raw-binary malware classifiers show promise, they are limited in their capabilities and struggle with the challenges of modeling long sequences. Meanwhile, the rise of large language models (LLMs) in natural language processing showcases the power of massive, self-supervised models trained on heterogeneous datasets, offering flexible representations for numerous downstream tasks. The success behind these models is rooted in the size and quality of their training data, the expressiveness and scalability of their neural architecture, and their ability to learn from unlabeled data in a self-supervised manner. In this work, we take the first steps toward developing large malware language models (LMLMs), the malware analog to LLMs. We tackle the core aspects of this objective, namely, questions about data, models, pretraining, and finetuning. By pretraining a malware classification model with language modeling objectives, we were able to improve downstream performance on diverse practical malware classification tasks on average by 1.1% and up to 28.6%, indicating that these models could serve to succeed raw-binary malware classifiers.
Free, publicly-accessible full text available February 24, 2027
ProTaxoVis—protein taxonomic visualisation of presence

https://doi.org/10.1186/s12859-025-06146-9

Hsieh, Yin-Chen; Bockwoldt, Mathias; Heiland, Ines (December 2025, BMC Bioinformatics)

Abstract Background: Protein presence information is an essential component of biological pathway identification. Presence of certain enzymes in an organism points towards the metabolic pathways that occur within it, whereas the absence of these enzymes indicates either the existence of alternative pathways or a lack of these pathways altogether. The same inference applies to regulatory pathways such as gene regulation and signal transduction. Protein presence information therefore forms the basis for biological pathway studies, and patterns in presence-absence across multiple organisms allow for comparative pathway analyses. Results: Here we present ProTaxoVis, a novel bioinformatic tool that extracts protein presence information from database queries and maps it to a taxonomic tree or heatmap. ProTaxoVis generates a large-scale overview of presence patterns in taxonomic clades of interest. This overview reveals protein distribution patterns, which can be used to deduce pathway evolution or to probe other biological questions. ProTaxoVis combines and filters sequence query results to extract information on the distribution of proteins and translates this information into two types of visual outputs: taxonomic trees and heatmaps. The trees supplement their topology with scaled pie-chart representations per node of the presence of target proteins and combinations of these proteins, such that patterns in taxonomic groups can easily be identified. The heatmap visualisation shows presence and conservation of these proteins for a user-determined set of species, allowing for a more detailed view over a larger group of proteins as compared to the trees. ProTaxoVis also allows for visual quality checks of hits based on a coverage plot and a length histogram, which can be used to determine e-value and minimum protein length cutoffs. Tabular output of the resulting data from the query, combining, and heatmap-building steps is saved and easily accessible for further analyses.
Conclusions: We evaluate our tool with the phosphoribosyltransferases, a transferase enzyme family with notable distribution patterns amongst organisms of varying complexities and across Eukaryota, Bacteria, and Archaea. ProTaxoVis is open-source and available at: https://github.com/MolecularBioinformatics/ProTaxoVis.
Free, publicly-accessible full text available December 1, 2026
CETD, a global compound events detection and visualisation toolbox and dataset

https://doi.org/10.1038/s41597-025-04530-x

Yin, Cong; Ting, Mingfang; Kornhuber, Kai; Horton, Radley M; Yang, Yaping; Jiang, Yelin (December 2025, Scientific Data)

Abstract Compound events (CEs) are attracting increased attention due to their significant societal and ecological impacts. However, their inherent complexity can pose challenges for climate scientists and practitioners, highlighting the need for a more approachable and intuitive framework for detecting and visualising CEs. Here, we introduce the Compound Events Toolbox and Dataset (CETD), which provides the first integrated, interactive, and extensible platform for CE detection and visualisation. Employing observations, reanalysis, and model simulations, CETD can quantify the frequency, duration, and severity of multiple CE types: multivariate, sequential, and concurrent events. It can analyse CEs often linked to severe impacts on human health, wildfires, and air pollution, such as hot-dry, wet-windy, and hot-dry-stagnation events. To validate the performance of CETD, we conduct statistical analyses for several high-impact events, such as the 2019 Australian wildfires and the 2022 European heatwaves. The accessibility and extensibility of CETD will benefit the broader community by enabling them to better understand and prepare for the risks and challenges posed by CEs in a warming world.
Free, publicly-accessible full text available December 1, 2026
Learning coarse-grained dynamics on graph

https://doi.org/10.1016/j.physd.2025.134801

Yu, Yin; Harlim, John; Huang, Daning; Li, Yan (November 2025, Physica D: Nonlinear Phenomena)

Free, publicly-accessible full text available November 1, 2026
Transient thermal analysis of composites containing spherical inhomogeneities for the particle size effect on laser flash measurements

https://doi.org/10.1016/j.ijsolstr.2025.113540

Wu, Chunlin; Yin, Huiming (October 2025, International Journal of Solids and Structures)

Free, publicly-accessible full text available October 1, 2026
Vinylogous imidonaphthoquinone [2+2] photocycloadditions to bridged aza-anthraquinones

Merski, Ian; Yin, Jinya; Vanderlindin, Ryan T; Rainier, Jon D (October 2025, Tetrahedron)

Free, publicly-accessible full text available October 31, 2026
Superheavy dark matter from the natural inflation in light of the highest-energy astroparticle events

https://doi.org/10.1088/1475-7516/2025/10/109

Murase, Kohta; Narita, Yuma; Yin, Wen (October 2025, Journal of Cosmology and Astroparticle Physics)

Abstract Superheavy dark matter has been attractive as a candidate of particle dark matter. We propose a “natural” particle model, in which the dark matter serves as the inflaton in natural inflation, while decaying to high-energy particles at energies of 10⁹-10¹³ GeV from the prediction of the inflation. A scalar field responsible for diluting the dark matter abundance revives the natural inflation either with or without the recent data from the Atacama Cosmology Telescope (ACT) and baryon acoustic oscillation results from Dark Energy Spectroscopic Instrument. Since the dark matter must be a spin-zero scalar, we carefully study the galactic dark matter 3-body decay into fermions and two body decays into a gluon pair, and point out relevant multi-messenger bounds that constrain these decay modes. Interestingly, the predicted energy scale may coincide with the AMATERASU event and/or the KM3NeT neutrino event, KM3-230213A. We also point out particle models with dark baryon to further alleviate γ-ray bounds. This scenario yields several testable predictions for the UHECR observations, including the highest-energy neutrons that are unaffected by magnetic fields, the tensor-to-scalar ratio, the running of spectral indices, α_s ≳ 𝒪(0.001), and the existence of light new colored particles that could be accessible at future collider experiments. Further measurements of high-energy cosmic rays, including their components and detailed directions, may provide insight into not only the origin of the cosmic rays but also inflation.
Free, publicly-accessible full text available October 1, 2026
Neuropathology-based approach reveals novel Alzheimer's Disease genes and highlights female-specific pathways and causal links to disrupted lipid metabolism: insights into a vicious cycle

https://doi.org/10.1186/s40478-024-01909-6

Jin, Yin; Topaloudi, Apostolia; Shekhar, Sudhanshu; Chen, Guangxin; Scott, Alicia Nicole; Colon, Bryce David; Drineas, Petros; Rochet, Chris; Paschou, Peristera (December 2025, Acta Neuropathologica Communications)

Free, publicly-accessible full text available December 1, 2026
Securing Visually-Aware Recommender Systems: An Adversarial Image Reconstruction and Detection Framework

https://doi.org/10.1145/3743681

Yin, Minglei; Liu, Bin; Gong, Neil Zhenqiang; Li, Xin (September 2025, ACM Transactions on Management Information Systems)

With rich visual data, such as images, becoming readily associated with items, visually-aware recommendation systems (VARS) have been widely used in different applications. Recent studies have shown that VARS are vulnerable to item-image adversarial attacks, which add human-imperceptible perturbations to the clean images associated with those items. Attacks on VARS pose new security challenges to a wide range of applications, such as e-commerce and social media, where VARS are widely used. How to secure VARS from such adversarial attacks becomes a critical problem. Currently, there is still a lack of systematic studies on how to design defense strategies against visual attacks on VARS. In this article, we attempt to fill this gap by proposing an adversarial image denoising and detection framework to secure VARS. Our proposed method can simultaneously (1) secure VARS from adversarial attacks characterized by local perturbations by image denoising based on global vision transformers; and (2) accurately detect adversarial examples using a novel contrastive learning approach. Meanwhile, our framework is designed to be used as both a filter and a detector so that they can be jointly trained to improve the flexibility of our defense strategy to a variety of attacks and VARS models. Our approach is uniquely tailored for VARS, addressing the distinct challenges in scenarios where adversarial attacks can differ across industries, for instance, causing misclassification in e-commerce or misrepresentation in real estate. We have conducted extensive experimental studies with two popular attack methods (FGSM and PGD). Our experimental results on two real-world datasets show that our defense strategy against visual attacks is effective and outperforms existing methods on different attacks. Moreover, our method demonstrates high accuracy in detecting adversarial examples, complementing its robustness across various types of adversarial attacks.
Free, publicly-accessible full text available September 30, 2026
IRIDIUM(IV) OXIDE-BASED CATALYSTS IN METHANE OXIDATION REACTIONS

Hsiao, Li-Yin (August 2025, University of Florida Electronic Theses and Dissertations)

Free, publicly-accessible full text available August 15, 2026
